Method and system for determining policy similarities

ABSTRACT

A method for determining similarity of two policies includes providing a first policy with n rules and a second policy with m rules, wherein each rule is structured into a plurality of identifiable elements, categorizing the rules in each policy based on an action, for each pair of rules finding those predicates whose attribute names match, computing an attribute similarity score for the attribute values, summing the attribute similarity scores for all pairs to obtain an element similarity score, and computing a rule similarity score for the pair of rules from a weighted sum of said element similarity scores.

TECHNICAL FIELD

This disclosure is directed to the comparison of security policies in collaborative computing applications.

DISCUSSION OF THE RELATED ART

The foundation of collaborative applications is the sharing of resources, such as services, data, and knowledge. Such applications can have different objectives, such as provisioning some complex service to a third party or performing collaborative data analysis, and may adopt different collaboration mechanisms and tools. However, a common requirement is the need to assure security for shared resources. It is important that the collaboration does not undermine the security of the collaborating parties and their resources. However, security should not drastically reduce the benefits deriving from the collaboration by severely restricting the access to the resources by the collaborating parties. A question that a party P may need to answer when deciding whether to share a resource with other parties is whether these other parties guarantee the same level of security as P. This is a complex question, and the first step to answering it requires the comparison of access control policies among resources. Access control policies are security, privacy and system management policies stored in semi-structured form in computers. Access control policies govern access to protected resources by stating which subjects can access which data for which operations and under which circumstances. During collaborations, a party P may decide to release some data to a party P₀ only if the access control policies of P₀ are very much the same as P's own access control policies. Having P just send its policies together with data to P₀ so that P₀ can directly enforce these policies may not always work. The evaluation of P's policies may require accessing some additional data that may not be available to P₀ for various reasons, for example, confidentiality, or P may not be willing to share its policies with P₀.

More complex situations arise when several alternative resources and services, each governed by its own independently administered access control policies, have been selected and combined in a complex service. In order to maximize the number of requests that can be satisfied by the complex service while at the same time satisfying the access control policies of each participating resource and service, it is desired to select for combination the resources and services characterized by access control policies that are similar. As an example, consider the case of a grid computing system, consisting of data owners and resource owners, each with its own access control policies. For a subject to be able to run a query on the data, this subject must verify both the access control policy associated with the queried data and the access control policy of the resource to be used to process the query. It is often the case that such parties do not have exactly the same access control policies; therefore, in order to maximize the access to the data, it is important to store the data for processing at the resource having access control policies similar to the access control policies associated with the data.

A trivial solution for computing policy similarity is a brute force approach, that is, one simply evaluates both policies for any request and any assignment, and then compares the results. This approach is inefficient, and even infeasible when the request domain is infinite.

Most current policy comparison work is performed manually since existing approaches to policy similarity analysis are limited and based mainly on logical reasoning and Boolean function comparison. Such approaches are computationally expensive and do not scale well for large heterogeneous distributed environments. One practical approach based on model checking analyzes role-based access-control policies written in the eXtensible Access Control Markup Language (XACML). This approach represents policies using a multi-terminal binary decision diagram and is able to verify policy properties and analyze differences between versions of policies. Another algorithm, for checking refinement of privacy policies, checks if one policy is a subset of another policy. Another category of relevant work is directed to policy conflict detection. One approach investigates interactions among policies and proposes a ratification tool by which a new policy is checked before being added to a set of policies. This approach determines the satisfiability of Boolean expressions corresponding to different policies. Another recent approach to computing policy similarity is limited to identifying policies specifying the same attribute.

SUMMARY OF THE INVENTION

Exemplary embodiments of the invention as described herein generally include methods and systems for computing a policy similarity score for two policies. If the similarity score of policies P1 and P2 is higher than that of policies P1 and P3, it means that P1 and P2 may yield the same decisions to a larger common request set than P1 and P3. The policy similarity measure can serve as a filter before applying any additional logical reasoning or Boolean function comparison. It can provide a useful lightweight approach to pre-compile a list of policies and return the most similar policies for further exploration. Such exploration could be a fine-grained policy analysis which identifies the common or differing parts of two policies, and can also include a visualization phase where users can identify the similar policies and make their own decisions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts the structure of an XACML policy, according to an embodiment of the invention.

FIG. 2 is an exemplary data owner policy, according to an embodiment of the invention.

FIGS. 3-4 are exemplary resource owner policies, according to an embodiment of the invention.

FIG. 5 is a flowchart of a method for computing a Φ mapping, according to an embodiment of the invention.

FIG. 6 depicts an exemplary hierarchy, according to an embodiment of the invention.

FIG. 7 depicts tables illustrating hierarchy codes for the hierarchy of FIG. 6, according to an embodiment of the invention.

FIG. 8 depicts a table of similarity scores of two sets of attributes, according to an embodiment of the invention.

FIG. 9 is a flowchart of a method for computing a policy similarity measure, according to an embodiment of the invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Exemplary embodiments of the invention as described herein generally include systems and methods for determining policy similarities. Accordingly, while the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the invention to the particular forms disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

A method according to an embodiment of the invention can efficiently compute a similarity score. Because of its generality, a similarity measure can be designed for policies written in XACML (Extensible Access Control Markup Language). The method here can be easily adapted to cover many other types of policies, written, for example, in P3P, or the Imperial College policy language Ponder. The similarity measure takes into account the policy structure and semantic information like attribute hierarchies. Given two policies, the method for computing the similarity score first groups the same components of the two policies, and evaluates their similarity by using hierarchy distance and numerical distance. Then the scores obtained for the different components of the policies are combined according to a weighted combination in order to produce an overall similarity score.

FIG. 1 depicts the structure of an exemplary XACML policy, according to an embodiment of the invention. An XACML policy 100 includes three major components, namely a Target 110, a Rule set 120, and a rule combining algorithm 130 for conflict resolution. The Target 110 specifies some predicates on the attribute values in a request, which must hold in order for the policy to be applicable to the request. The attributes in the Target element are categorized as Subject 111, Resource 112, and Action 113. These fields are optional, and not all fields are illustrated in the policy examples depicted in FIGS. 2, 3, and 4. A Rule set includes one or more Rules 121. Only one rule is illustrated in the figure for clarity. Each Rule 121 in turn includes Target 122, Condition 126, and Effect 127 elements. The rule Target has the same structure as the policy Target, and includes a Subject 123, Resource 124, and Action 125. The only difference is that the rule Target specifies the situation when the rule can be applied. The policy target can be thought of as a global target applying to each rule. A Condition 126 element specifies some restrictions on request attribute values that must be satisfied in order to yield a Permit 128 or Deny 129 decision as specified by the Effect 127 element. The policy similarity measure described below is based on the comparison of each corresponding component of the policies being compared. Here, the corresponding component refers to policy targets and the same type of elements belonging to the rules with the same effect.

As an example that will be used in this disclosure, consider three policies P₁, P₂ and P₃, in the context of data and resource management for a grid computing system in a university domain. In particular, P₁ is a data owner policy depicted in FIG. 2, whereas P₂ and P₃ are resource owner policies illustrated in FIGS. 3 and 4. Specifically, P₁ states that professors, postdocs, students and technical staff in an industry project group are allowed to read or write source, documentation or executable files of size less than 100 MB. P₁ denies the write operations for postdocs, students and technical staff between 19:00 and 21:00 hours because professors may want to check and make changes to the project files without any distraction. P₂ is an access control policy of a project machine. P₂ allows students, faculty and technical staff in the industry project group to read or write files of size less than 120 MB. P₂ gives a special permission to technical staff between 19:00 and 22:00 hours so that technical staff can carry out system maintenance and back up files, and denies students the permission to write any file when technical staff is possibly working on maintenance. Moreover, P₂ does not allow any user to operate on media files on the machine. P₃ is an access control policy for another machine, mainly used by business staff. P₃ states that only business staff in the group named “Payroll” can read or write .xls files of size less than 10 MB from 8:00 to 17:00 hours, and it clearly denies students the access to the machine. FIGS. 2, 3 and 4 report the XACML specification for these policies. It is to be understood that these policies are exemplary and illustrative, and other embodiments of the invention are not limited to these policies or the particular setting.

From a user's perspective, P₁ is more similar to P₂ than to P₃ because most activities described by P₁ for the data owner are allowed by P₂. It is desired to quickly compute similarity scores S₁ between P₁ and P₂, and S₂ between P₁ and P₃, where one would expect S₁ to be larger than S₂ to indicate that the similarity between P₁ and P₂ is much higher than the similarity between P₁ and P₃.

The policy similarity measure between any two given policies should assign a similarity score that approximates the relationship between the sets of requests permitted (denied) by the two policies. The similarity score is a value between 0 and 1, which reflects how similar these rules are with respect to the targets they are applicable to and also with respect to the conditions they impose on the requests. For example, in a scenario where a set of requests permitted (denied) by a policy P₁ is a subset of requests permitted (denied) by a policy P₂, the similarity score for policies P₁ and P₂ must be higher than the score assigned in a scenario in which the sets of requests permitted (denied) by P₁ and P₃ have very few or no requests in common.

Similarity Scores:

Given two policies P₁ and P₂, the rules in these policies are first grouped according to their effects, which results in a set of Permit Rules (denoted as PR) and a set of Deny Rules (denoted as DR). Each single rule in P₁ is then compared with a rule in P₂ that has the same effect, and a similarity score of two rules is obtained. The similarity score obtained between the rules is then used to find one-to-many mappings (denoted as Φ) for each rule in the two policies. For clarity, four separate Φ mappings Φ₁^(P), Φ₁^(D), Φ₂^(P) and Φ₂^(D) are used. The mapping Φ₁^(P) (Φ₁^(D)) maps each PR(DR) rule r_(1i) in P₁ with one or more PR(DR) rules r_(2j) in P₂. Similarly the mapping Φ₂^(P) (Φ₂^(D)) maps each PR(DR) rule r_(2j) in P₂ with one or more PR(DR) rules r_(1i) in P₁. For each rule in a policy P₁(P₂), the Φ mappings give similar rules in P₂(P₁) which satisfy a certain similarity threshold. The computation of the Φ mappings will be described in detail below.

By using the Φ mappings, one can compute the similarity score between a rule and a policy. One can find how similar a rule is with respect to the entire policy by comparing the single rule in one policy with a set of similar rules in the other policy. The notation rs_(1i)(rs_(2j)) denotes the similarity score for a rule r_(1i)(r_(2j)) in policy P₁(P₂). The rule similarity score rs_(1i)(rs_(2j)) is the average of the similarity scores between a rule r_(1i)(r_(2j)) and the rules similar to it given by the Φ mapping. rs_(1i) and rs_(2j) are computed according to the following expressions:

$$
rs_{1i} = \begin{cases}
\dfrac{\sum_{r_j \in \Phi_1^P(r_{1i})} S_{rule}(r_{1i}, r_j)}{\left|\Phi_1^P(r_{1i})\right|}, & r_{1i} \in PR_1, \\[2ex]
\dfrac{\sum_{r_j \in \Phi_1^D(r_{1i})} S_{rule}(r_{1i}, r_j)}{\left|\Phi_1^D(r_{1i})\right|}, & r_{1i} \in DR_1,
\end{cases} \tag{1}
$$

$$
rs_{2j} = \begin{cases}
\dfrac{\sum_{r_i \in \Phi_2^P(r_{2j})} S_{rule}(r_{2j}, r_i)}{\left|\Phi_2^P(r_{2j})\right|}, & r_{2j} \in PR_2, \\[2ex]
\dfrac{\sum_{r_i \in \Phi_2^D(r_{2j})} S_{rule}(r_{2j}, r_i)}{\left|\Phi_2^D(r_{2j})\right|}, & r_{2j} \in DR_2,
\end{cases} \tag{2}
$$

where S_(rule) is a function that assigns a similarity score between two rules, and |Φ()| represents the cardinality of the particular set of Φ mappings.
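As a minimal sketch of EQ. 1 and EQ. 2, the following Python function averages precomputed rule-pair scores over a rule's Φ set. The function name, the dictionary layout of the scores, the sample values, and the choice to return 0 for an empty Φ set are assumptions made for illustration; the disclosure does not specify how an empty Φ set is handled.

```python
from typing import Dict, Hashable, Iterable, Tuple


def rule_policy_score(rule: Hashable,
                      phi: Iterable[Hashable],
                      s_rule: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """Average S_rule between `rule` and the rules in its Phi set (EQ. 1 / EQ. 2)."""
    similar = list(phi)
    if not similar:
        # Assumption: a rule with no sufficiently similar counterpart scores 0.
        return 0.0
    return sum(s_rule[(rule, other)] for other in similar) / len(similar)


# Hypothetical scores of one permit rule of P1 against two permit rules of P2.
scores = {("r11", "r21"): 0.8, ("r11", "r22"): 0.6}
rs_11 = rule_policy_score("r11", ["r21", "r22"], scores)
```

The same function serves permit and deny rules, since the two cases in each equation differ only in which Φ mapping supplies the set of similar rules.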

Next, the similarity score is computed between the permit (deny) rule sets PR₁(DR₁) and PR₂(DR₂) of policies P₁ and P₂ respectively. The notations S_(rule-set)^(P) and S_(rule-set)^(D) are used to denote the similarity scores for permit and deny rule sets respectively. The similarity score for a permit (deny) rule set is obtained by averaging the rule similarity scores (EQS. 1 and 2) for all rules in the set. The permit and deny rule set similarity scores are formulated as follows:

$$
S_{rule\text{-}set}^{P} = \frac{\sum_{i=1}^{N_{PR_1}} rs_{1i} + \sum_{j=1}^{N_{PR_2}} rs_{2j}}{N_{PR_1} + N_{PR_2}}, \tag{3}
$$

$$
S_{rule\text{-}set}^{D} = \frac{\sum_{i=1}^{N_{DR_1}} rs_{1i} + \sum_{j=1}^{N_{DR_2}} rs_{2j}}{N_{DR_1} + N_{DR_2}}, \tag{4}
$$

where N_(PR₁) and N_(PR₂) are the numbers of rules in PR₁ and PR₂ respectively, and N_(DR₁) and N_(DR₂) are the numbers of rules in DR₁ and DR₂ respectively.
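EQ. 3 and EQ. 4 amount to a pooled mean over the rule similarity scores of both policies. A brief sketch, assuming the per-rule scores are already available as two lists; the function name and the decision to raise an error when both lists are empty are illustrative assumptions, since the disclosure does not define that case.

```python
from typing import Sequence


def rule_set_score(rs_1: Sequence[float], rs_2: Sequence[float]) -> float:
    """Pooled mean of rule similarity scores from both policies (EQ. 3 / EQ. 4)."""
    total_rules = len(rs_1) + len(rs_2)
    if total_rules == 0:
        # Assumption: the case of no rules with this effect is left undefined.
        raise ValueError("no rules with this effect in either policy")
    return (sum(rs_1) + sum(rs_2)) / total_rules


# Hypothetical scores: two permit rules in P1 and one permit rule in P2.
s_permit = rule_set_score([0.7, 0.5], [0.9])
```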

Finally, the similarity scores for permit and deny rule sets between the two policies are combined with a similarity score between the Target elements of the two policies, to develop an overall similarity score, S_(policy). The formulation of S_(policy) is given by the following equation:

$$
S_{policy}(P_1, P_2) = w_T\, S_T(P_1, P_2) + w_p\, S_{rule\text{-}set}^{P} + w_d\, S_{rule\text{-}set}^{D}, \tag{5}
$$

where S_(T) is a function that computes a similarity score between the Target elements of any two given policies, and w_(T) is the associated weight; w_(p) and w_(d) are weights that can be chosen to reflect the relative importance to be given to the similarity of permit and deny rule sets respectively. For normalization purposes, the weight values should satisfy the constraint: w_(T)+w_(p)+w_(d)=1.

The intuition behind the similarity score assigned to any two policies is derived from the fact that two policies are similar to one another when the corresponding policy elements are similar.

Computation of Φ Mappings:

The one-to-many Φ mappings determine for each PR(DR) rule in P₁(P₂) which PR(DR) rules in P₂(P₁) are very similar. Intuitively, two rules are similar when their targets and the conditions they specify are similar. Thus a Φ mapping can be defined as follows:

$$
\Phi(r_i) = \{\, r_j \mid S_{rule}(r_i, r_j) \geq \varepsilon \,\}, \tag{6}
$$

where S_(rule) is computed by EQ. 7, below, and ε is a threshold. The threshold term allows calibration of the quality of the similarity approximation. It is expected that the actual value of the threshold will be very specific to the policy domain. FIG. 5 presents a flowchart summarizing the procedure for calculating a Φ mapping. This procedure takes as inputs two rule sets R′ and R″ and a threshold value ε, and computes a mapping for each rule in R′ based on EQ. 6. Referring to the figure, the mapping algorithm starts at step 51 by providing the inputs R′, R″ and ε. Then, at step 52, for each rule r′∈R′, the Φ mapping is initialized to the empty set at step 53. Then, at step 54, for each rule r″∈R″, if S_(rule)(r′,r″)≥ε, then the rule r″ is added to the Φ mapping: Φ(r′)=Φ(r′)∪{r″}. The method checks for more rules r″ at step 56, and for more rules r′ at step 57. After all rules have been processed, the Φ mapping is returned at step 58.

Similarity Score between Rules:

Since the similarity measure serves as a lightweight filter phase, it should not involve complicated analysis of Boolean expressions. The similarity measure is developed based on the intuition that rules r_(i) and r_(j) are similar when both apply to similar targets and both specify similar conditions on request attributes, i.e. they are structurally similar. Specifically, the rule similarity function S_(rule) between two rules r_(i) and r_(j) is computed as follows:

$$
S_{rule}(r_i, r_j) = w_t\, S_t(r_i, r_j) + w_c\, S_c(r_i, r_j), \tag{7}
$$

where w_(t) and w_(c) are weights that can be used for emphasizing the importance of the target or condition similarity, respectively. For example, if users are more interested in finding policies applied to similar targets, they can increase w_(t) to achieve this purpose. The weights satisfy the constraint w_(t)+w_(c)=1. S_(t) and S_(c) are functions that compute a similarity score between two rules based on the comparison of their Target and Condition elements, respectively.

As the Target element in each rule contains the Subject, Resource and Action elements, each of these elements in turn contains predicates on the respective category of attributes. Thus, the Target similarity function S_(t) is computed as follows:

$$
S_t(r_i, r_j) = w_s\, S_s(r_i, r_j) + w_r\, S_r(r_i, r_j) + w_a\, S_a(r_i, r_j). \tag{8}
$$

In EQ. 8, w_(s), w_(r), w_(a) represent weights that are assigned to the corresponding similarity scores. As in the previous equations, weight values need to satisfy the constraint w_(s)+w_(r)+w_(a)=1. S_(s), S_(r) and S_(a) are functions that return a similarity score based on the Subject, Resource and Action attribute predicates respectively in the Target elements of the two given rules.
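The two-level weighting of EQ. 7 and EQ. 8 can be sketched as nested weighted sums. In this illustration the element scores are passed in as plain numbers; the function names, the sample scores, and the sample weights are hypothetical and are not prescribed by the disclosure.

```python
import math
from typing import Dict


def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine element scores with weights that must sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same elements")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


def target_score(s_s: float, s_r: float, s_a: float,
                 w_s: float, w_r: float, w_a: float) -> float:
    """EQ. 8: Target similarity from Subject, Resource and Action scores."""
    return weighted_sum({"s": s_s, "r": s_r, "a": s_a},
                        {"s": w_s, "r": w_r, "a": w_a})


def rule_score(s_t: float, s_c: float, w_t: float, w_c: float) -> float:
    """EQ. 7: rule similarity from Target and Condition scores."""
    return weighted_sum({"t": s_t, "c": s_c}, {"t": w_t, "c": w_c})


# Hypothetical element scores and weights.
s_t = target_score(0.8, 0.5, 1.0, w_s=0.4, w_r=0.4, w_a=0.2)
s_rule = rule_score(s_t, 0.6, w_t=0.5, w_c=0.5)
```

Checking the weight constraint in one helper keeps both levels normalized, so the resulting rule score stays between 0 and 1 whenever the element scores do.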

The computation of functions S_(c), S_(s), S_(r) and S_(a) involves the comparison of pairs of predicates in the given pair of rule elements, which is discussed in detail in the next subsection.

Similarity Score of Rule Elements:

Each of the rule elements Subject, Resource, Action and Condition is represented as a set of predicates in the form of {attr_name₁ ⊕₁ attr_value₁, attr_name₂ ⊕₂ attr_value₂, …}, where attr_name denotes the attribute name, ⊕ denotes a comparison operator and attr_value represents an attribute value. It is assumed that there are no syntactic variations for the same attribute name. For example, there cannot exist attribute names “emp-name” and “EmpName” in different policies, both of which refer to the employee name attribute. The unification of the attribute names can be done using one of the many existing approaches that have been developed for schema matching.

Based on the type of attribute values, predicates are divided into two categories, namely categorical predicates and numerical predicates.

-   Categorical predicate: The attribute values of this type of predicate are categorical data that belong to some domain-specific ontology. Predicates like “Designation=Professor” and “FileType=Documentation” belong to the categorical type.
-   Numerical predicate: The attribute values of this type of predicate belong to integer, real, or date/time data types. For example, predicates “FileSize<10 MB” and “Time=12:00” are of numerical type.

The similarity score between two rules r_(i) and r_(j) regarding the same element is denoted as S_(<Element>), where <Element> refers to ‘c’ (condition), ‘s’ (subject), ‘r’ (resource) or ‘a’ (action). The S_(<Element>) is computed by comparing the corresponding predicate sets in two rules. There are three steps. First, the predicates are clustered for each rule element according to the attribute names. It is worth noting that one attribute name may be associated with multiple values. Second, one finds the predicates in the two rules whose attribute names match exactly and then proceeds to compute a similarity score for their attribute values. The way similarity scores are computed between attribute values differs, depending on whether the attribute value is of categorical type or numerical type (details of the computation are covered in the following subsections). Finally, the scores of each pair of matching predicates are summed to obtain the similarity score of the rule element. Since not all attributes in one rule can find a match in the other, a penalty is included for this case by dividing the sum of similarity scores of matching pairs by the maximum number of attributes in a rule. Note that a match can be a syntactic match, or can be a synonym discoverable in an electronic dictionary or an ontology.

In addition, there is a special case when the element set is empty in one rule, which means no constraint exists for this element. For this case, the similarity of the elements of the two rules is considered to be 0.5 due to the consideration that one rule is a restriction of the other and the 0.5 is the estimation of the average similarity.

The formal definition of S_(<Element>) is given by EQ. 9:

$$
S_{\langle Element \rangle}(r_i, r_j) = \begin{cases}
\dfrac{\sum_{(a_{1k}, a_{2l}) \in M_a} S_{\langle attr\_typ \rangle}(a_{1k}, a_{2l})}{\max\left(N_{a_1}, N_{a_2}\right)}, & N_{a_1} > 0 \text{ and } N_{a_2} > 0, \\[2ex]
1, & \text{otherwise}.
\end{cases} \tag{9}
$$

In EQ. 9, M_(a) is a set of pairs of matching predicates with the same attribute names, a_(1k) and a_(2l) are attributes of rules r_(i) and r_(j) respectively, S_(<attr_typ>) is the similarity score of attribute values of the type attr_typ, and N_(a₁) and N_(a₂) are the numbers of distinct predicates in the two rules respectively.
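A hedged sketch of the element score follows, representing each rule element as a dictionary from attribute name to its list of values. Note one assumption in particular: EQ. 9 as printed returns 1 whenever either predicate set is empty, whereas the text above gives 0.5 when one rule leaves the element unconstrained. The sketch follows the text for the one-empty case and returns 1 only when both sets are empty, which is an assumed reading rather than something the disclosure states. The function name, the sample attributes, and the toy attribute scorer are likewise hypothetical.

```python
from typing import Callable, Dict, Sequence

Values = Sequence[object]


def element_score(e1: Dict[str, Values], e2: Dict[str, Values],
                  attr_score: Callable[[Values, Values], float]) -> float:
    """Similarity of one rule element (Subject, Resource, Action or Condition)."""
    if not e1 and not e2:
        return 1.0   # assumed reading: neither rule constrains this element
    if not e1 or not e2:
        return 0.5   # from the text: one rule is a restriction of the other
    matched = e1.keys() & e2.keys()            # predicates with equal attribute names
    total = sum(attr_score(e1[name], e2[name]) for name in matched)
    # Dividing by the larger predicate count penalizes unmatched attributes.
    return total / max(len(e1), len(e2))


def same_values(a: Values, b: Values) -> float:
    """Toy scorer: 1 if the value sets are identical, else 0."""
    return 1.0 if set(a) == set(b) else 0.0


subject_1 = {"Designation": ["Professor"], "Group": ["Project"]}
subject_2 = {"Designation": ["Professor"]}
s_subject = element_score(subject_1, subject_2, same_values)
```

In practice `attr_score` would dispatch to the categorical or numerical predicate score depending on the attribute type.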

In addition, the computation of the similarity score of two policytargets S_(T) is the same as that for the rule targets i.e., S_(t).

Similarity Score for Categorical Predicates:

For the categorical values, one should not only consider the exact match of two values, but also consider their semantic similarity. For example, consider policy P₁ talking about the priority of professors, policy P₂ talking about faculty members, and policy P₃ talking about business staff. In some sense, policy P₁ is more similar to policy P₂ than to policy P₃ because “professors” is a subset of “faculty members”, which means that policy P₁ could be a restriction of policy P₂. Based on this observation, the approach assumes that a hierarchy relationship exists for the categorical values.

The similarity between two categorical values (denoted as s_(cat)) is then defined according to the shortest path between these two values in the hierarchy. The formal definition is shown below:

$$
s_{cat}(v_1, v_2) = 1 - \frac{SPath(v_1, v_2)}{2H}, \tag{10}
$$

where SPath(v₁, v₂) denotes the length of the shortest path between two values v₁ and v₂, and H is the height of the hierarchy. In EQ. 10, the length of the shortest path between two values is normalized by the possible maximum path length, which is 2H. The closer the two values are located in the hierarchy, the more similar the two values will be, and hence a higher similarity score s_(cat) will be obtained.

FIG. 6 illustrates an example hierarchy, where each node represents a categorical value. A single tree graph represents the categorical values for both attributes. The height of the hierarchy is 3, and the length of the maximum path between two values is estimated as 2×3=6 (the actual maximum path in the figure is 5 due to the imbalance of the hierarchy). The SPath(E,B) is 1, and the SPath(E,F) is 2. According to EQ. 10, the similarity score of nodes E and B is 1−(1/6)=0.83, and the similarity score of nodes E and F is 1−(2/6)=0.67. From the obtained scores, one can observe that E is more similar to B than to F. The underlying idea is that the parent-child relationship (B and E) implies one rule could be a restriction of the other, and this would be more helpful than the sibling relationship (E and F) in rule integration.
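The two scores quoted for FIG. 6 can be reproduced with a direct transcription of EQ. 10. The function name and the clamp at 0 are illustrative assumptions; the clamp is added so that an infinite path length, as used for unmatched values outside a hierarchy, yields 0.

```python
def s_cat_value(spath: float, height: int) -> float:
    """EQ. 10: similarity of two categorical values from their path length."""
    # Clamp at 0 so an infinite path (no relation between the values) scores 0.
    return max(0.0, 1.0 - spath / (2 * height))


score_e_b = s_cat_value(1, 3)   # parent-child pair (E, B)
score_e_f = s_cat_value(2, 3)   # sibling pair (E, F)
```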

To avoid repeatedly searching the hierarchy tree for the same value during the shortest path computation, each node is assigned a hierarchy code (Hcode), indicating the position of each node. In particular, the root node is assigned an Hcode equal to ‘1’, and its children nodes are named in order from left to right by appending their position to the parent's Hcode with a separator ‘.’, yielding Hcodes such as ‘1.1’ and ‘1.2’. The process continues until the leaf level is reached. The number of elements separated by ‘.’ is equal to the level at which a node is located. From such Hcodes one can compute the length of the shortest path between two nodes. Two Hcodes are compared element by element until either the end of one Hcode is reached or there is a difference. The common elements correspond to the same shared parent nodes, and the number of different elements corresponds to the levels that need to be generalized to their common parent node. Therefore, the shortest path is the total number of different elements in the two Hcodes. For example, the length of the shortest path from node ‘1.1’ to ‘1.2’ is 2, as there are two different elements in the Hcodes.
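The Hcode comparison described above can be sketched as counting the elements that remain in each code after the shared prefix. The function name is hypothetical, and the parent-child code ‘1.1.2’ used in the example is an assumed code, not one taken from FIG. 7.

```python
def spath_from_hcodes(hcode_1: str, hcode_2: str) -> int:
    """Shortest path length between two nodes identified by their Hcodes."""
    parts_1 = hcode_1.split(".")
    parts_2 = hcode_2.split(".")
    common = 0
    # Walk both codes until one ends or the elements differ.
    for a, b in zip(parts_1, parts_2):
        if a != b:
            break
        common += 1
    # Elements beyond the shared prefix are the levels to climb on each side.
    return (len(parts_1) - common) + (len(parts_2) - common)


siblings = spath_from_hcodes("1.1", "1.2")
parent_child = spath_from_hcodes("1.1", "1.1.2")
```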

Note that the definition of s_(cat) can be applied to categorical values which do not lie in a hierarchy. In that case, if two values are matched, their shortest path SPath is 0 and their similarity score will be 1; otherwise, SPath is infinity and their similarity score becomes 0.

Having introduced the approach to compare two single values, the discussion can be extended to two sets of values. Suppose there are two attributes a₁:{v₁₁, v₁₂, v₁₃, v₁₄} and a₂:{v₂₁, v₂₂, v₂₃}, where a₁ and a₂ are the attribute names belonging to policy P₁ and P₂ respectively, and values in the brackets are corresponding attribute values. Note that the listed values belonging to the same attribute are different from one another. The similarity score of the two attribute value sets is the sum of similarity scores of pairs <v_(1k), v_(2l)> and a compensating score δ for non-matching attribute values. Obviously, there could be many combinations of pairs. It is desired to find a set of pairs, denoted as M_(v), which have the following properties:

1. If v_(1k)=v_(2l), then (v_(1k), v_(2l))∈M_(v).
2. For pairs v_(1k)≠v_(2l), pairs contributing to the maximum sum of similarity scores belong to M_(v).
3. Each attribute value v_(1k) or v_(2l) occurs at most once in M_(v).
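One way to obtain a pair set with these three properties is an exhaustive search over the values left after exact matches are removed. The sketch below is an assumption about how the selection could be done, not the procedure of the disclosure: it pairs every value of the shorter remainder, keeps the first best-scoring assignment it finds, and does not apply the compensating-score tie-break discussed in the text. The function name, the sample values, and the similarity table are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def match_values(vals1: Sequence[str], vals2: Sequence[str],
                 sim: Callable[[str, str], float]) -> List[Tuple[str, str]]:
    """Return a pair set M_v satisfying properties 1-3 by exhaustive search."""
    # Property 1: identical values are always paired with each other.
    exact = [(v, v) for v in vals1 if v in vals2]
    rest1 = [v for v in vals1 if v not in vals2]
    rest2 = [v for v in vals2 if v not in vals1]

    # Property 3: each value of the shorter remainder gets a distinct partner.
    swap = len(rest1) > len(rest2)
    short, long_ = (rest2, rest1) if swap else (rest1, rest2)

    best_pairs: List[Tuple[str, str]] = []
    best_total = float("-inf")
    for perm in permutations(long_, len(short)):
        # Keep every pair ordered as (value of a1, value of a2).
        pairs = list(zip(perm, short)) if swap else list(zip(short, perm))
        total = sum(sim(a, b) for a, b in pairs)
        if total > best_total:   # Property 2: maximize the summed similarity
            best_pairs, best_total = pairs, total
    return exact + best_pairs


# Hypothetical similarity table for the values left after exact matching.
table = {("B", "E"): 0.5, ("C", "E"): 0.67}
m_v = match_values(["A", "B", "C", "D"], ["A", "E", "D"],
                   lambda a, b: table[(a, b)])
```

The search is factorial in the number of unmatched values, which is acceptable for the short value lists typical of policy predicates; larger sets would call for a polynomial-time assignment algorithm.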

The process of finding the pair set M_(v) is the following. First, obtain the hierarchy code for each attribute value. See FIG. 7 for an example of these values for the example hierarchy shown in FIG. 6. Then compute the similarity between pairs of attribute values with the help of the hierarchy code.

FIG. 8 shows the resulting scores for the example. Next, pick out the exactly matched pairs, which are <v₁₁, v₂₁> and <v₁₄, v₂₃> in the example. For the remaining attribute values, find pairs that maximize the sum of similarity scores of pairs. In this example, <v₁₂, v₂₂> has the same similarity score as <v₁₃, v₂₂>, and hence one needs to further consider which choice can lead to a greater compensating score. The compensating score δ is for attribute values which do not have matches when two attributes have a different number of values. δ is computed as the average similarity score between unmatched values and all the values of the other attribute. For this example, no matter which pair is chosen, the compensating score is the same. Suppose the pair <v₁₂, v₂₂> is chosen. This leaves one value, v₁₃, whose compensating score δ is (0.33+0.67+0.17)/3=0.39.

Finally, the similarity score for the two attributes a₁ and a₂ takes into account both the similarity of attribute names and attribute values. Specifically, the similarity score for attribute names is 1 since they are exactly matched, and the similarity score for attribute values is the average of the scores of the pairs and the compensating score. The final score is (½)[1+(1+1+0.67+0.39)/4]=0.88.
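The arithmetic of this worked example can be checked with a short sketch. The pair scores and the three similarity values of the unmatched value v₁₃ are the ones quoted in the text; the function name is hypothetical, and δ is rounded to two decimals as in the text.

```python
from typing import Sequence


def predicate_score(pair_scores: Sequence[float], delta: float,
                    n_values_1: int, n_values_2: int) -> float:
    """Half weight for the matched name, half for the normalized value score."""
    value_part = (sum(pair_scores) + delta) / max(n_values_1, n_values_2)
    return 0.5 * (1.0 + value_part)


# Compensating score for v13: its average similarity to the three values of a2.
delta = round((0.33 + 0.67 + 0.17) / 3, 2)

# Two exact matches plus the chosen pair <v12, v22>; a1 has 4 values, a2 has 3.
final = predicate_score([1.0, 1.0, 0.67], delta, 4, 3)
```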

The similarity score of two categorical predicates is defined as follows:

$$
S_{cat}(a_1, a_2) = \frac{1}{2}\left[ 1 + \frac{\sum_{(v_{1k}, v_{2l}) \in M_v} s_{cat}(v_{1k}, v_{2l}) + \delta}{\max\left(N_{v_1}, N_{v_2}\right)} \right], \tag{11}
$$

$$
\delta = \begin{cases}
\dfrac{\sum_{(v_{1k}, \_) \notin M_v} \sum_{l=1}^{N_{v_2}} s_{cat}(v_{1k}, v_{2l})}{N_{v_2}}, & N_{v_1} > N_{v_2}, \\[2ex]
\dfrac{\sum_{(\_, v_{2l}) \notin M_v} \sum_{k=1}^{N_{v_1}} s_{cat}(v_{1k}, v_{2l})}{N_{v_1}}, & N_{v_2} > N_{v_1},
\end{cases} \tag{12}
$$

where N_(v₁) and N_(v₂) are the total numbers of values associated with attributes a₁ and a₂ respectively.

Similarity Score for Numerical Predicates:

Unlike categorical values, numerical values do not have any hierarchical relationship. For computational efficiency, the similarity of two numerical values v₁ and v₂ is defined based on their difference as shown in EQ. 13:

$$
s_{num}(v_1, v_2) = 1 - \frac{\left| v_1 - v_2 \right|}{\max(v_1, v_2)}. \tag{13}
$$

The s_(num) tends to be large when the difference between two values is small.
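A minimal sketch of EQ. 13 follows, using the file size limits of the example policies (100 MB, 120 MB and 10 MB) purely as sample inputs. The function name, the restriction to non-negative values, and the treatment of two zero values as identical are assumptions, since the disclosure does not discuss these edge cases.

```python
def s_num_value(v1: float, v2: float) -> float:
    """EQ. 13: similarity of two numerical values from their relative difference."""
    if v1 < 0 or v2 < 0:
        # Assumption: the measure is only applied to non-negative quantities.
        raise ValueError("values must be non-negative")
    largest = max(v1, v2)
    if largest == 0:
        return 1.0   # assumption: two zero values are treated as identical
    return 1.0 - abs(v1 - v2) / largest


size_p1_p2 = s_num_value(100, 120)   # 100 MB limit vs. 120 MB limit
size_p1_p3 = s_num_value(100, 10)    # 100 MB limit vs. 10 MB limit
```

With these inputs the first pair scores markedly higher than the second, in line with the expectation that P₁ is closer to P₂ than to P₃.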

The computation of the similarity score of two numerical value sets is similar to that for two categorical value sets, and there is thus the following similarity definition for numerical predicates:

$\begin{matrix}{s_{num}(a_{1},a_{2}) = \frac{1}{2}\left\lbrack 1 + \frac{\sum\limits_{(v_{1k},v_{2l}) \in M_{v}} s_{num}(v_{1k},v_{2l}) + \delta}{\max(N_{v_{1}},N_{v_{2}})} \right\rbrack,} & (14) \\ {\delta = \left\{ \begin{matrix}{\frac{\sum\limits_{(v_{1k},\_) \notin M_{v}} \sum\limits_{l = 1}^{N_{v_{2}}} s_{num}(v_{1k},v_{2l})}{N_{v_{2}}},} & {N_{v_{1}} > N_{v_{2}},} \\ {\frac{\sum\limits_{(\_,v_{2l}) \notin M_{v}} \sum\limits_{k = 1}^{N_{v_{1}}} s_{num}(v_{1k},v_{2l})}{N_{v_{1}}},} & {N_{v_{2}} > N_{v_{1}}.}\end{matrix} \right.} & (15)\end{matrix}$

Overall Algorithm:

The steps involved in the computation of a similarity score between two policies P₁ and P₂ are illustrated in the flowchart of FIG. 9. Referring to FIG. 9, the algorithm takes as arguments at step 91 a policy P₁ with n rules {r₁₁, r₁₂, . . . , r_(1n)}, and a policy P₂ with m rules {r₂₁, r₂₂, . . . , r_(2m)}.

The algorithm includes five phases. In the first phase, the rules in P₁ and P₂ are categorized at step 92 based on their effects as either permit or deny rules.

Second, the similarity score S_(rule) is computed for each pair of rules in P₁ and P₂, where S_(rule) is defined by EQ. (7). The similarity of each permit rule of P₁ with each permit rule of P₂ is calculated at step 93, and the similarity of each deny rule of P₁ with each deny rule of P₂ is calculated at step 94.

In the third phase, based on S_(rule), the Φ mappings Φ₁^(P), Φ₁^(D), Φ₂^(P) and Φ₂^(D) are computed at step 95. The function ComputePhiMapping, which implements EQ. 6, is illustrated in the flow chart of FIG. 5.
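The thresholding behind the Φ mappings can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `compute_phi_mapping`, the dictionary layout of the scores, and the parameter names are hypothetical, while the scores and the threshold 0.7 are the ones used in the case study of this description.

```python
def compute_phi_mapping(rules_a, rules_b, s_rule, threshold):
    """Map each rule in rules_a to the rules in rules_b whose score is >= threshold.

    s_rule is assumed to be a dict keyed by (rule_a, rule_b) pairs of rules
    that share the same effect.
    """
    return {
        ra: [rb for rb in rules_b if s_rule[(ra, rb)] >= threshold]
        for ra in rules_a
    }


# Rule similarity scores between P1 and P2, as listed in the case study.
s_rule = {
    ("R11", "R21"): 0.81, ("R11", "R22"): 0.56,   # permit rules
    ("R12", "R23"): 0.81, ("R12", "R24"): 0.76,   # deny rules
}

phi_permit = compute_phi_mapping(["R11"], ["R21", "R22"], s_rule, 0.7)
phi_deny = compute_phi_mapping(["R12"], ["R23", "R24"], s_rule, 0.7)
print(phi_permit)
print(phi_deny)
```

A rule whose best score falls below the threshold is mapped to an empty list, which corresponds to an empty Φ mapping.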

Fourth, the Φ mappings are used to calculate the rule set similarity scores. At step 96, the rule similarity score is computed for each rule in P₁ with the rules in Φ₁^(P) and Φ₁^(D). At step 97, the rule similarity score is computed for each rule in P₂ with the rules in Φ₂^(P) and Φ₂^(D). The function ComputeRuleSimilarity is defined by EQ. (1) for the rules in policy P₁, step 96, and by EQ. (2) for the rules in policy P₂, step 97. The overall rule similarity scores for the permit rules and deny rules are calculated at step 98 by averaging the respective rule similarity scores. The similarity of the permit rules is defined by EQ. (3), and the similarity of the deny rules is defined by EQ. (4).

Finally, in the fifth phase, the overall similarity score is obtained at step 99 by weighting the similarities of the permit and deny rules with the Target similarity score, as defined by EQ. (5). The Target similarity can be calculated similarly to that of the Target elements of two rules, as defined by EQ. (8).

The most computationally expensive part of the algorithm is the computation of S_(rule). S_(rule) is the sum of similarity scores of corresponding elements. Suppose the average number of attributes in one element is n_(a). To find matching attributes with the same name, it takes O(n_(a) log n_(a)) to sort and compare the lists of attribute names. For each pair of matching attributes, the similarity score of the attribute values is then computed. Generally speaking, one attribute name is associated with one or very few values (e.g. 10). Therefore, the time for the attribute value computation can be estimated to be a constant time c. The complexity of computing a similarity score of two elements is then O(n_(a) log n_(a)+n_(a)c). For each rule, there are at most 5 elements, so the computational complexity of S_(rule) is still O(n_(a) log n_(a)). This is not the only similarity measure possible. Other comparisons can be made. For example, instead of matching exact attribute names one can use a dictionary to look for synonyms. This will add little cost to the computation. In other words, extensions or changes to the similarity measure can be added as long as they cost no more than O(n log n). Other techniques, like synonym matching from information retrieval, could be incorporated into the measure.
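The sort-and-compare step mentioned above can be illustrated with a short Python sketch. It assumes exact (syntactic) name matching and distinct attribute names within each element; the function name `match_attribute_names` and the sample attribute names are hypothetical.

```python
def match_attribute_names(names_1, names_2):
    """Return the attribute names common to both lists.

    Both lists are sorted (O(n log n)) and then merged in a single linear pass,
    so the sort dominates the cost. Names are assumed distinct within a list.
    """
    a, b = sorted(names_1), sorted(names_2)
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1   # a[i] has no partner in b
        else:
            j += 1   # b[j] has no partner in a
    return matches


# Hypothetical attribute names of one element in two rules.
print(match_attribute_names(["role", "time", "dept"], ["time", "role", "file-type"]))
```

A synonym dictionary, as suggested in the text, could be applied by normalizing each name before sorting; that variant is not shown here.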

It is to be understood, however, that a policy similarity measure according to an embodiment of the invention does not require the rules to be restricted to “permit” and “deny” rules, but can apply to any kind of rule that is structured into identifiable sections where the same sections in two rules are compared with a similarity measure. For example, if rules have the form WHEN Event IF Condition THEN Action, the same methodology can be applied, i.e. for two rules WHEN Event1 IF Condition1 THEN Action1 and WHEN Event2 IF Condition2 THEN Action2, similarity measures can be defined to get S_(e)(Event1, Event2) and S_(c)(Condition1, Condition2) if Action1=Action2. Instead of performing an exact comparison, one can create a similarity measure to establish similarities. One can also create a similarity measure between actions so that a comparison can still be performed when the difference between the actions is not as clear cut as in the case of Permit/Deny. For example, if there is a policy that says: IF Response time of Application A is <10 millisec in the next hour THEN Add servers to serve Application A, it can be compared with a policy that says: IF subscribers of Application A are more than 1000 this month THEN Add resources to Application A. Thus, although the actions are not the same, they are similar, in that they allocate more resources. Note also that these are system management policies, so a policy similarity measure according to an embodiment of the invention applies not only to security policies but also to system management and administration management policies.

Case Study

This section provides an example to illustrate how a policy similarity measure algorithm according to an embodiment of the invention works. Continuing with the policy examples P₁, P₂ and P₃ introduced above, the policy similarity algorithm assigns a similarity score between these policies. Furthermore, the similarity algorithm assigns a higher similarity score between the data owner policy P₁ and resource owner policy P₂ than between the data owner policy P₁ and resource owner policy P₃, adequately representing the relationship between the sets of requests permitted (denied) by the corresponding policies. Thus, using the similarity score computed by this algorithm according to an embodiment of the invention, the data owner can decide to host his/her data at the resource owner with policy P₂, which is more compatible with the data owner's own policy.

In the following discussion reference is made to the policies shown in FIGS. 2, 3 and 4. Without having any additional knowledge of the application, it can be assumed that each rule component has the same importance and has an equal weight in all computations.

The similarity score between P₁ and P₂ is calculated as follows.

1. The rules in P₁ and P₂ are categorized based on their effects to find the permit and deny sets, PR₁ (PR₂) and DR₁ (DR₂). These sets are:

- PR₁={R11};
- PR₂={R21, R22};
- DR₁={R12};
- DR₂={R23, R24}.

2. The rule similarity scores are computed between pairs of rules in both policies:

- S(R11, R21)=0.81;
- S(R11, R22)=0.56;
- S(R12, R23)=0.81;
- S(R12, R24)=0.76.

3. For policy P₁, find the Φ mappings Φ₁^(P) and Φ₁^(D) using the ComputePhiMapping procedure. Using 0.7 as the threshold value for this example when computing the mappings, the Φ mappings obtained for policy P₁ are as follows:
Φ₁^(P) = {R11→{R21}},
Φ₁^(D) = {R12→{R23, R24}}.

4. The Φ mappings Φ₂^(P) and Φ₂^(D) are calculated similarly for policy P₂:
Φ₂^(P) = {R21→{R11}, R22→{ }},
Φ₂^(D) = {R23→{R12}, R24→{R12}}.

5. For each rule r_(1i) in P₁, the rule similarity score rs_(1i) is computed:

$rs_{11} = S_{rule}(R11, R21) = 0.81;$
$rs_{12} = \frac{1}{2}\left\lbrack S_{rule}(R12, R23) + S_{rule}(R12, R24) \right\rbrack = 0.79.$

6. Similarly, for each rule r_(2j) in P₂, the rule similarity score rs_(2j) is computed:
rs₂₁ = S_(rule)(R21, R11) = 0.81;
rs₂₂ = 0;
rs₂₃ = S_(rule)(R23, R12) = 0.81;
rs₂₄ = S_(rule)(R24, R12) = 0.76.

7. The similarity between the permit rule sets of P₁ and P₂, given by S_(rule-set)^(P), is computed:

$\begin{matrix}{S_{rule-set}^{P} = \frac{rs_{11} + rs_{21} + rs_{22}}{3}} \\ {= \frac{0.81 + 0.81 + 0.0}{3}} \\ {= 0.54.}\end{matrix}$

8. The similarity between the deny rule sets of P₁ and P₂, given by S_(rule-set)^(D), is computed:

$\begin{matrix}{S_{rule-set}^{D} = \frac{rs_{12} + rs_{23} + rs_{24}}{3}} \\ {= \frac{0.79 + 0.81 + 0.76}{3}} \\ {= 0.79.}\end{matrix}$

9. Finally, the permit and deny rule set similarities and the policy target similarity are combined to obtain the overall policy similarity score S₁ between policies P₁ and P₂:

$\begin{matrix}{S_{policy}(P_{1},P_{2}) = \frac{1}{3}S_{T} + \frac{1}{3}S_{rule-set}^{P} + \frac{1}{3}S_{rule-set}^{D}} \\ {= \frac{1}{3} \cdot 0.75 + \frac{1}{3} \cdot 0.54 + \frac{1}{3} \cdot 0.79} \\ {= 0.71.}\end{matrix}$

The policy similarity score is then calculated for policies P₁ and P₃. The policy target similarity score is S_(T)=0.5. The rule similarity scores for policies P₁ and P₃ are:
S(R11, R31)=0.7;
S(R12, R32)=0.66.
By using the threshold 0.7, the following Φ mappings are obtained:
Φ₁^(P) = {R11→{R31}},
Φ₁^(D) = {R12→{ }}.
Following the same steps as above, one can compute a policy similarity score S₂ between P₁ and P₃:

$\begin{matrix}{S_{policy}(P_{1},P_{3}) = \frac{1}{3}S_{T} + \frac{1}{3}S_{rule-set}^{P} + \frac{1}{3}S_{rule-set}^{D}} \\ {= \frac{1}{3} \cdot 0.5 + \frac{1}{3} \cdot 0.7 + \frac{1}{3} \cdot 0.0} \\ {= 0.4.}\end{matrix}$

Observe that policy P₁ is clearly more similar to policy P₂ than to policy P₃. Hence the data owner would choose to maintain data on the resource owner with policy P₂.
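The steps of the case study can be retraced with a short Python sketch. It is a sketch under stated assumptions rather than the claimed implementation: all function names are hypothetical, the scores are stored symmetrically, each rule set is assumed to be non-empty, the rules of P₃ are assumed to be named R31 (permit) and R32 (deny), and equal weights of 1/3 are used as in the case study. It reproduces the two rule-set similarities for P₁ and P₂ and the full score for P₁ and P₃.

```python
EPS = 0.7  # threshold used in the case study


def phi(rules_a, rules_b, s_rule, eps=EPS):
    """Phi mapping: for each rule in rules_a, the rules in rules_b scoring >= eps."""
    return {a: [b for b in rules_b if s_rule[frozenset((a, b))] >= eps]
            for a in rules_a}


def rule_scores(mapping, s_rule):
    """rs per rule: mean S_rule over its phi mapping, or 0 when the mapping is empty."""
    return {a: (sum(s_rule[frozenset((a, b))] for b in bs) / len(bs)) if bs else 0.0
            for a, bs in mapping.items()}


def rule_set_similarity(rules_1, rules_2, s_rule, eps=EPS):
    """Average rs over the rules of one effect in both policies (non-empty sets assumed)."""
    rs = rule_scores(phi(rules_1, rules_2, s_rule, eps), s_rule)
    rs.update(rule_scores(phi(rules_2, rules_1, s_rule, eps), s_rule))
    return sum(rs.values()) / len(rs)


def policy_similarity(s_target, s_permit, s_deny, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of target, permit rule-set and deny rule-set similarities."""
    w_t, w_p, w_d = weights
    return w_t * s_target + w_p * s_permit + w_d * s_deny


# P1 versus P2: rule similarity scores from step 2 of the case study.
s12 = {frozenset(("R11", "R21")): 0.81, frozenset(("R11", "R22")): 0.56,
       frozenset(("R12", "R23")): 0.81, frozenset(("R12", "R24")): 0.76}
permit_12 = rule_set_similarity(["R11"], ["R21", "R22"], s12)
deny_12 = rule_set_similarity(["R12"], ["R23", "R24"], s12)

# P1 versus P3: rule names R31 and R32 are assumed; target similarity is 0.5.
s13 = {frozenset(("R11", "R31")): 0.7, frozenset(("R12", "R32")): 0.66}
permit_13 = rule_set_similarity(["R11"], ["R31"], s13)
deny_13 = rule_set_similarity(["R12"], ["R32"], s13)
score_13 = policy_similarity(0.5, permit_13, deny_13)

print(f"P1/P2 permit = {permit_12:.3f}, deny = {deny_12:.3f}")
print(f"P1/P3 permit = {permit_13:.3f}, deny = {deny_13:.3f}, policy = {score_13:.3f}")
```

Because the deny pair of P₁ and P₃ scores 0.66, below the threshold, both deny rules receive an empty Φ mapping and the deny rule-set similarity is 0, which is what lowers the P₁ and P₃ score.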

System Implementation

It is to be understood that the present invention can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof. In one embodiment, the present invention can be implemented in software as an application program tangibly embodied on a computer readable program storage device. The application program can be uploaded to, and executed by, a machine comprising any suitable architecture.

While the present invention has been described in detail with reference to a preferred embodiment, those skilled in the art will appreciate that various modifications and substitutions can be made thereto without departing from the spirit and scope of the invention as set forth in the appended claims.

1. A computer-implemented method for determining similarity of two access control policies, the method performed by the computer comprising the steps of: providing a first policy with n rules; providing a second policy with m rules; categorizing the rules in each policy based on effect, wherein said rules are categorized as either permit rules or deny rules; calculating a rule similarity score for each permit rule in said first policy with each permit rule in said second policy; calculating a rule similarity score for each deny rule in said first policy with each deny rule in said second policy; for each rule in each policy, calculating a rule-set similarity score between said rule and the rules of similar effect in the other policy; averaging the rule-set similarity scores for all permit rules; averaging the rule-set similarity scores for all deny rules; and calculating a policy similarity score from a weighted sum of the average permit rule-set similarity score and the average deny rule-set similarity score, wherein said policy similarity score is indicative of the similarity of said first and second policies, wherein each said policy is a set of rules for determining access to and use of resources in an information system.
2. The method of claim 1, further comprising, for each rule in each policy, collecting in a phi mapping those rules of similar effect in the other policy, wherein said rule-set similarity scores are computed between a rule and those rules in its phi mapping.
3. The method of claim 2, wherein the phi mapping Φ(r_(i)) for a rule r_(i) in one effect category of one policy is calculated from Φ(r_(i))={r_(j) | S_(rule)(r_(i), r_(j))≥ε}, for each r_(j) in the same effect category of the other policy, wherein S_(rule)(r_(i), r_(j)) represents the rule similarity score for rules r_(i), r_(j), and ε is a predetermined threshold.
4. The method of claim 2, wherein a rule-set similarity score between a rule r_(1i) in one policy and other rules r_(j) of similar effect in said other policy is calculated from $\frac{\sum\limits_{r_{j} \in \Phi_{1}^{E}(r_{1i})} S_{rule}(r_{1i},r_{j})}{\left| \Phi_{1}^{E}(r_{1i}) \right|},$ wherein S_(rule)(r_(1i), r_(j)) is said rule similarity score, Φ₁^(E)(r_(1i)) is the phi mapping for rule r_(1i) to other rules of effect E in the other policy, wherein the sum is over all rules r_(j) in Φ₁^(E)(r_(1i)), and |Φ₁^(E)(r_(1i))| is the cardinality of Φ₁^(E)(r_(1i)).
5. The method of claim 1, wherein each policy comprises a target that includes a subject, a resource, and an action, wherein the method further comprises adding a weighted target similarity score between target elements of said first and second policies.
6. The method of claim 1, wherein each rule comprises a plurality of elements, including a subject element, a resource element, an action element, and a condition element, wherein each element is represented as a set of predicates in the form of {attr_name₁⊕₁attr_value₁, attr_name₂⊕₂attr_value₂, . . . }, where attr_name denotes an attribute name, ⊕ denotes a comparison operator and attr_value represents an attribute value, wherein attribute values include categorical values or numerical values.
7. The method of claim 6, wherein calculating a rule similarity score between a rule in said first policy and a rule in said second policy comprises: finding those predicates in said two rules whose attribute names match, wherein a match is either a syntactic match or a synonym; for each predicate with matching attribute names, computing an attribute similarity score for the attribute values; and summing the attribute similarity scores for all pairs of matching predicates to obtain an element similarity score.
8. The method of claim 7, wherein if an element set is empty for one rule, setting the element similarity score to 0.5.
9. The method of claim 7, wherein said rule similarity score between rule r_(i) in said first policy and rule r_(j) in said second policy is a weighted sum of said element similarity scores, equivalent to the expression w_(s)S_(s)(r_(i), r_(j))+w_(r)S_(r)(r_(i), r_(j))+w_(a)S_(a)(r_(i), r_(j))+w_(c)S_(c)(r_(i), r_(j)), wherein S_(s), S_(r), S_(a), and S_(c) are the element similarity scores for the subject, resource, action and condition elements, respectively, and w_(s), w_(r), w_(a), and w_(c) are the respective weights, wherein a sum of the weights is one.
10. The method of claim 7, wherein calculating an attribute similarity score between two attributes, a first attribute associated with said first rule, and a second attribute associated with said second rule, each attribute having associated attribute value sets, comprises summing a similarity score for all attribute pairs {v_(1k), v_(2l)}, wherein v_(1k) is associated with said first attribute and v_(2l) is associated with said second attribute, and a compensating score for non matching attribute values.
11. The method of claim 10, wherein an attribute similarity score for numerical attribute values v_(1k), v_(2l) is defined by an expression equivalent to $\frac{1}{2}\left\lbrack 1 + \frac{\sum\limits_{(v_{1k},v_{2l}) \in M_{v}} s_{num}(v_{1k},v_{2l}) + \delta}{\max(N_{v_{1}},N_{v_{2}})} \right\rbrack,$ wherein $s_{num}(v_{1},v_{2}) = 1 - \frac{\left| v_{1} - v_{2} \right|}{\max(v_{1},v_{2})},$ and δ is the compensating score defined as $\delta = \left\{ \begin{matrix}{\frac{\sum\limits_{(v_{1k},\_) \notin M_{v}} \sum\limits_{l = 1}^{N_{v_{2}}} s_{num}(v_{1k},v_{2l})}{N_{v_{2}}},} & {N_{v_{1}} > N_{v_{2}},} \\ {\frac{\sum\limits_{(\_,v_{2l}) \notin M_{v}} \sum\limits_{k = 1}^{N_{v_{1}}} s_{num}(v_{1k},v_{2l})}{N_{v_{1}}},} & {N_{v_{2}} > N_{v_{1}},}\end{matrix} \right.$ wherein N_(v₁) and N_(v₂) are the total number of values associated with said first and second attributes, respectively, and M_(v) is a set of pairs of matched attribute values.
12. The method of claim 10, wherein calculating an attribute similarity score for attributes a₁, a₂ having categorical attribute values v_(1k), v_(2l) comprises: representing the categorical attribute values of both attributes in a single hierarchical tree graph, wherein each node of said tree represents a categorical value; representing each node with a hierarchy code indicative of the position of the node within the tree; computing a similarity score between each pair of attribute values v_(1k), v_(2l) wherein v_(1k) is an attribute of a₁ and v_(2l) is an attribute of a₂; summing similarity scores for those attribute value pairs with matching values; summing similarity scores for the remaining attribute value pairs that maximize said sum of pair similarity scores; adding a compensating score for unmatched attribute values, wherein said compensating score is an average of similarity scores between unmatched values with all other attribute values; adding a similarity score for attribute names, wherein said similarity score for attribute values is an average of pair similarity scores and the compensating score.
13. The method of claim 12, wherein a similarity score between a pair of attribute values v₁, v₂ is computed from $1 - \frac{SPath(v_{1},v_{2})}{2H},$ where SPath(v₁, v₂) denotes the length of a shortest path between values v₁ and v₂, and H is a height of the hierarchy.
14. The method of claim 12, wherein said hierarchy code is defined with a root node assigned a code equal to ‘1’, and child nodes are coded in order from left to right by appending their position to the parent's code with a separator ‘.’.
15. A computer-implemented method for determining similarity of two access control policies, the method performed by the computer comprising the steps of: providing a first policy with n rules and a second policy with m rules, wherein each rule comprises a plurality of elements, including a subject element, a resource element, an action element, and a condition element that determines an effect of said rule, wherein each element is represented as a set of predicates in the form of {attr_name₁⊕₁attr_value₁, attr_name₂⊕₂attr_value₂, . . . }, where attr_name denotes an attribute name, ⊕ denotes a comparison operator and attr_value represents an attribute value, wherein attribute values include categorical values or numerical values; categorizing the rules in each policy based on effect, wherein said rules are categorized as either permit rules or deny rules; for each rule r_(i) in said first policy and each rule r_(j) in said second policy of similar effect, finding those predicates in said pair of rules whose attribute names match, wherein a match is either a syntactic match or a synonym; for each predicate with matching attribute names, computing an attribute similarity score for the attribute values; summing the attribute similarity scores for all pairs of matching predicates to obtain an element similarity score; and computing a rule similarity score S_(rule)(r_(i), r_(j)) for said pair of rules from a weighted sum of said element similarity scores, wherein said rule similarity score is indicative of the similarity of said first and second policies, wherein each said policy is a set of rules for determining access to and use of resources in an information system.
16. The method of claim 15, further comprising: for each rule in each policy, calculating a rule-set similarity score between said rule and the rules of similar effect in the other policy; averaging the rule-set similarity scores for all permit rules; averaging the rule-set similarity scores for all deny rules; and calculating a policy similarity score from a weighted sum of the average permit rule-set similarity score and the average deny rule-set similarity score.
17. The method of claim 15, wherein said weighted sum of said element similarity scores is equivalent to the expression w_(s)S_(s)(r_(i), r_(j))+w_(r)S_(r)(r_(i), r_(j))+w_(a)S_(a)(r_(i), r_(j))+w_(c)S_(c)(r_(i), r_(j)), wherein S_(s), S_(r), S_(a), and S_(c) are the element similarity scores for the subject, resource, action and condition elements, respectively, and w_(s), w_(r), w_(a), and w_(c) are the respective weights, wherein a sum of the weights is one.
18. The method of claim 15, wherein a rule element similarity score between rules r_(i) and r_(j) is calculated from $\left\{ \begin{matrix}{\frac{\sum\limits_{(a_{1k},a_{2l}) \in M_{a}} S(a_{1k},a_{2l})}{\max(N_{a_{1}},N_{a_{2}})},} & {N_{a_{1}} > 0\ \text{and}\ N_{a_{2}} > 0,} \\ {1,} & {\text{otherwise},}\end{matrix} \right.$ wherein M_(a) is a set of pairs of matching predicates with the same attribute names, a_(1k) and a_(2l) are attributes of rules r_(1i) and r_(2j) respectively, S(a_(1k), a_(2l)) is the similarity score of attribute values of the attributes a_(1k) and a_(2l), and N_(a₁) and N_(a₂) are the numbers of distinct predicates in said two rules respectively.
19. The method of claim 18, wherein an attribute similarity score for numerical attribute values v_(1k), v_(2l) is defined by an expression equivalent to $\frac{1}{2}\left\lbrack 1 + \frac{\sum\limits_{(v_{1k},v_{2l}) \in M_{v}} s_{num}(v_{1k},v_{2l}) + \delta}{\max(N_{v_{1}},N_{v_{2}})} \right\rbrack,$ wherein $s_{num}(v_{1},v_{2}) = 1 - \frac{\left| v_{1} - v_{2} \right|}{\max(v_{1},v_{2})},$ and δ is a compensating score for non matching attribute values defined as $\delta = \left\{ \begin{matrix}{\frac{\sum\limits_{(v_{1k},\_) \notin M_{v}} \sum\limits_{l = 1}^{N_{v_{2}}} s_{num}(v_{1k},v_{2l})}{N_{v_{2}}},} & {N_{v_{1}} > N_{v_{2}},} \\ {\frac{\sum\limits_{(\_,v_{2l}) \notin M_{v}} \sum\limits_{k = 1}^{N_{v_{1}}} s_{num}(v_{1k},v_{2l})}{N_{v_{1}}},} & {N_{v_{2}} > N_{v_{1}},}\end{matrix} \right.$ wherein N_(v₁) and N_(v₂) are the total number of values associated with said first and second attributes, respectively, and M_(v) is a set of pairs of matched attribute values.
20. The method of claim 18, wherein an attribute similarity score for attributes a₁, a₂ with categorical attribute values v_(1k), v_(2l) is defined by an expression equivalent to $S_{cat}(a_{1},a_{2}) = \frac{1}{2}\left\lbrack 1 + \frac{\sum\limits_{(v_{1k},v_{2l}) \in M_{v}} s_{cat}(v_{1k},v_{2l}) + \delta}{\max(N_{v_{1}},N_{v_{2}})} \right\rbrack,$ wherein $s_{cat}(v_{1},v_{2}) = 1 - \frac{SPath(v_{1},v_{2})}{2H},$ where SPath(v₁, v₂) denotes the length of a shortest path between two values v₁ and v₂ in a hierarchical tree graph representing the categorical attribute values of both attributes wherein each node of said tree represents a categorical value, and H is the height of the hierarchy, and δ is a compensating score for non matching attribute values defined as $\delta = \left\{ \begin{matrix}{\frac{\sum\limits_{(v_{1k},\_) \notin M_{v}} \sum\limits_{l = 1}^{N_{v_{2}}} s_{cat}(v_{1k},v_{2l})}{N_{v_{2}}},} & {N_{v_{1}} > N_{v_{2}},} \\ {\frac{\sum\limits_{(\_,v_{2l}) \notin M_{v}} \sum\limits_{k = 1}^{N_{v_{1}}} s_{cat}(v_{1k},v_{2l})}{N_{v_{1}}},} & {N_{v_{2}} > N_{v_{1}},}\end{matrix} \right.$ where N_(v₁) and N_(v₂) are the total numbers of values associated with attributes a₁ and a₂ respectively, and M_(v) is a set of pairs of matched attribute values.
21. The method of claim 20, further comprising representing each node with a hierarchy code indicative of the position of the node within the tree, wherein a root node of said tree is assigned a code equal to ‘1’, and child nodes are assigned codes in order from left to right by appending their position to the parent's code with a separator ‘.’, and wherein a shortest path between two attribute values is a total number of different elements in the corresponding hierarchy codes.
22. The method of claim 20, wherein said set M_(v) includes those attribute value pairs (v_(1k), v_(2l)) wherein v_(1k)=v_(2l) and those attribute value pairs (v_(1k), v_(2l)) wherein v_(1k)≠v_(2l) whose attribute value similarity score maximizes a sum of attribute value similarity scores, wherein each attribute value v_(1k), v_(2l) occurs at most once in M_(v).
23. The method of claim 15, further comprising, for each rule in each policy, collecting in a phi mapping those rules of similar effect in the other policy, said phi mapping for a rule r_(i) in one policy computed from Φ(r_(i))={r_(j) | S_(rule)(r_(i), r_(j))≥ε}, for each r_(j) in the same effect category of the other policy, wherein S_(rule)(r_(i), r_(j)) represents the rule similarity score for rules r_(i), r_(j), and ε is a predetermined threshold.
24. The method of claim 15, wherein if an element set is empty for one rule, setting the element similarity score to 0.5.
25. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform the method steps for determining similarity of two access control policies, said method comprising the steps of: providing a first policy with n rules; providing a second policy with m rules; categorizing the rules in each policy based on effect, wherein said rules are categorized as either permit rules or deny rules; calculating a rule similarity score for each permit rule in said first policy with each permit rule in said second policy; calculating a rule similarity score for each deny rule in said first policy with each deny rule in said second policy; for each rule in each policy, calculating a rule-set similarity score between said rule and the rules of similar effect in the other policy; averaging the rule-set similarity scores for all permit rules; averaging the rule-set similarity scores for all deny rules; and calculating a policy similarity score from a weighted sum of the average permit rule-set similarity score and the average deny rule-set similarity score, wherein said policy similarity score is indicative of the similarity of said first and second policies, wherein each said policy is a set of rules for determining access to and use of resources in an information system.
26. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform the method steps for determining similarity of two access control policies, said method comprising the steps of: providing a first policy with n rules and a second policy with m rules, wherein each rule comprises a plurality of elements, including a subject element, a resource element, an action element, and a condition element that determines an effect of said rule, wherein each element is represented as a set of predicates in the form of {attr_name₁⊕₁attr_value₁, attr_name₂⊕₂attr_value₂, . . . }, where attr_name denotes an attribute name, ⊕ denotes a comparison operator and attr_value represents an attribute value, wherein attribute values include categorical values or numerical values; categorizing the rules in each policy based on effect, wherein said rules are categorized as either permit rules or deny rules; for each rule r_(i) in said first policy and each rule r_(j) in said second policy of similar effect, finding those predicates in said pair of rules whose attribute names match, wherein a match is either a syntactic match or a synonym; for each predicate with matching attribute names, computing an attribute similarity score for the attribute values; summing the attribute similarity scores for all pairs of matching predicates to obtain an element similarity score; and computing a rule similarity score S_(rule)(r_(i), r_(j)) for said pair of rules from a weighted sum of said element similarity scores, wherein said rule similarity score is indicative of the similarity of said first and second policies, wherein each said policy is a set of rules for determining access to and use of resources in an information system.
27. A computer-implemented method for determining similarity of two policies, the method performed by the computer comprising the steps of: providing a first policy with n rules and a second policy with m rules, wherein each rule is structured into a plurality of identifiable elements, including an event element, a condition element, and an action element, wherein each element is represented as a set of predicates in the form of {attr_name₁⊕₁attr_value₁, attr_name₂⊕₂attr_value₂, . . . }, where attr_name denotes an attribute name, ⊕ denotes a comparison operator and attr_value represents an attribute value; categorizing the rules in each policy based on action; for each rule r_(i) in said first policy and each rule r_(j) in said second policy of similar effect, finding those predicates in said pair of rules whose attribute names match, wherein a match is either a syntactic match or a synonym; for each predicate with matching attribute names, computing an attribute similarity score for the attribute values; summing the attribute similarity scores for all pairs of matching predicates to obtain an element similarity score; and computing a rule similarity score S_(rule)(r_(i), r_(j)) for said pair of rules from a weighted sum of said element similarity scores, wherein said rule similarity score is indicative of the similarity of said first and second policies, wherein each said policy is a set of rules for determining access to and use of resources in an information system.
28. The method of claim 27, wherein categorizing the rules in each policy based on action comprises: for each rule r_(i) in said first policy and each rule r_(j) in said second policy, finding those action predicates in said pair of rules whose attribute names match, wherein a match is either a syntactic match or a synonym; for each predicate with matching attribute names, computing an attribute similarity score for the attribute values; summing the attribute similarity scores for all pairs of matching predicates to obtain an action similarity score, wherein said rule is categorized based on said action similarity score.